Skip to content

Conversation

@pwilkin
Copy link
Collaborator

@pwilkin pwilkin commented Jul 13, 2025

[x] I have no idea what I'm doing

This is my first attempt at adding a new arch and I am very much out of my depth, so I would really appreciate if someone took a look at it and verified if it even made any sense. I basically asked Gemini to make a patch based on the existing vLLM / Chatllm.cpp implementations, then tackled some of the conversion logic myself so that it actually generates a GGUF file with all the layers.

Would close #14465

@github-actions github-actions bot added the python python script changes label Jul 13, 2025
@pwilkin pwilkin marked this pull request as draft July 13, 2025 10:44
@pwilkin pwilkin marked this pull request as ready for review July 13, 2025 18:03
@pwilkin
Copy link
Collaborator Author

pwilkin commented Jul 13, 2025

All right, I made a Q4_0 model and got a coherent response, so I guess this somewhat works. I'm upgrading this from draft status, maybe someone can take a look.

@pwilkin
Copy link
Collaborator Author

pwilkin commented Jul 14, 2025

A sample quant for this model has been uploaded here: https://huggingface.co/ilintar/ERNIE-4.5-21B-A3B-PT-gguf

pwilkin and others added 2 commits July 14, 2025 14:08
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
@pwilkin
Copy link
Collaborator Author

pwilkin commented Jul 14, 2025

@CISC all right, think that should be all the fixes.

@pwilkin pwilkin requested a review from CISC July 14, 2025 12:57
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
@pwilkin pwilkin requested a review from CISC July 14, 2025 19:16
@theo77186
Copy link

I'm testing this branch, while testing speculative decoding, it seems it caused a regression loading the dense 300M model.

Logs loading the dense model only
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes
  Device 1: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
build: 5896 (542f36bbb) with cc (Debian 14.2.0-19) 14.2.0 for x86_64-linux-gnu
system info: n_threads = 16, n_threads_batch = 16, total_threads = 32

system_info: n_threads = 16 (n_threads_batch = 16) / 32 | CUDA : ARCHS = 860,890 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 | 

main: binding port with default address family
main: HTTP server is listening, hostname: 127.0.0.1, port: 8080, http threads: 31
main: loading model
srv    load_model: loading model 'ERNIE-4.5-0.3B-f16.gguf'
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 4060 Ti) - 15216 MiB free
llama_model_load_from_file_impl: using device CUDA1 (NVIDIA GeForce RTX 3060) - 11822 MiB free
llama_model_loader: loaded meta data with 30 key-value pairs and 164 tensors from ERNIE-4.5-0.3B-f16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = ernie4_5
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = ERNIE 4.5 0.3B PT
llama_model_loader: - kv   3:                           general.finetune str              = PT
llama_model_loader: - kv   4:                           general.basename str              = ERNIE-4.5
llama_model_loader: - kv   5:                         general.size_label str              = 0.3B
llama_model_loader: - kv   6:                            general.license str              = apache-2.0
llama_model_loader: - kv   7:                               general.tags arr[str,2]       = ["ERNIE4.5", "text-generation"]
llama_model_loader: - kv   8:                          general.languages arr[str,2]       = ["en", "zh"]
llama_model_loader: - kv   9:                       ernie4_5.block_count u32              = 18
llama_model_loader: - kv  10:                    ernie4_5.context_length u32              = 131072
llama_model_loader: - kv  11:                  ernie4_5.embedding_length u32              = 1024
llama_model_loader: - kv  12:               ernie4_5.feed_forward_length u32              = 3072
llama_model_loader: - kv  13:              ernie4_5.attention.head_count u32              = 16
llama_model_loader: - kv  14:           ernie4_5.attention.head_count_kv u32              = 2
llama_model_loader: - kv  15:                    ernie4_5.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv  16:  ernie4_5.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  17:              ernie4_5.attention.key_length u32              = 128
llama_model_loader: - kv  18:            ernie4_5.attention.value_length u32              = 128
llama_model_loader: - kv  19:                          general.file_type u32              = 1
llama_model_loader: - kv  20:               general.quantization_version u32              = 2
llama_model_loader: - kv  21:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  22:                         tokenizer.ggml.pre str              = default
llama_model_loader: - kv  23:                      tokenizer.ggml.tokens arr[str,103424]  = ["<unk>", "<s>", "</s>", "0", "1", "2...
llama_model_loader: - kv  24:                      tokenizer.ggml.scores arr[f32,103424]  = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  25:                  tokenizer.ggml.token_type arr[i32,103424]  = [2, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  26:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  27:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  28:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  29:                    tokenizer.chat_template str              = {%- if not add_generation_prompt is d...
llama_model_loader: - type  f32:   37 tensors
llama_model_loader: - type  f16:  127 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = F16
print_info: file size   = 688.14 MiB (16.00 BPW) 
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special tokens cache size = 1012
load: token to piece cache size = 0.5907 MB
print_info: arch             = ernie4_5
print_info: vocab_only       = 0
print_info: n_ctx_train      = 131072
print_info: n_embd           = 1024
print_info: n_layer          = 18
print_info: n_head           = 16
print_info: n_head_kv        = 2
print_info: n_rot            = 128
print_info: n_swa            = 0
print_info: is_swa_any       = 0
print_info: n_embd_head_k    = 128
print_info: n_embd_head_v    = 128
print_info: n_gqa            = 8
print_info: n_embd_k_gqa     = 256
print_info: n_embd_v_gqa     = 256
print_info: f_norm_eps       = 0.0e+00
print_info: f_norm_rms_eps   = 1.0e-05
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: f_attn_scale     = 0.0e+00
print_info: n_ff             = 3072
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: causal attn      = 1
print_info: pooling type     = 0
print_info: rope type        = 0
print_info: rope scaling     = linear
print_info: freq_base_train  = 500000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 131072
print_info: rope_finetuned   = unknown
print_info: model type       = 0.3B
print_info: model params     = 360.75 M
print_info: general.name     = ERNIE 4.5 0.3B PT
print_info: vocab type       = SPM
print_info: n_vocab          = 103424
print_info: n_merges         = 0
print_info: BOS token        = 1 '<s>'
print_info: EOS token        = 2 '</s>'
print_info: UNK token        = 0 '<unk>'
print_info: PAD token        = 0 '<unk>'
print_info: LF token         = 23 '<0x0A>'
print_info: EOG token        = 2 '</s>'
print_info: max token length = 48
load_tensors: loading model tensors, this can take a while... (mmap = true)
llama_model_load: error loading model: missing tensor '__missing__'
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'ERNIE-4.5-0.3B-f16.gguf'

It seems to complain about missing tensors, which doesn't happen on master.

@CISC
Copy link
Collaborator

CISC commented Jul 14, 2025

I'm testing this branch, while testing speculative decoding, it seems it caused a regression loading the dense 300M model.

Yep, it's broken for all dense models right now, will suggest a fix. :)

@Mushoz
Copy link

Mushoz commented Jul 14, 2025

My apologies for this slightly offtopic post, but with the introduction of Ernie 4.5 a new quantization algorithm was also introduced with supposedly SOTA performance at 2 bit. Is that something that will also be incorporated into llama.cpp?

@pwilkin
Copy link
Collaborator Author

pwilkin commented Jul 14, 2025

Dropping this in here for future reference (this is the only reference implementation of the VL part so far, from what I can tell):

https://github.com/PaddlePaddle/FastDeploy/tree/develop/fastdeploy/model_executor/models/ernie4_5_vl

@ThiloteE
Copy link
Contributor

ThiloteE commented Jul 15, 2025

I noticed (because I wanted to try this branch), you are trying to merge from "master" of your fork into "master" of ggml-org/llama.cpp. Is this accepted practice or is creating a separate branch a requirement for merging into llama.cpp?

@CISC
Copy link
Collaborator

CISC commented Jul 15, 2025

I noticed (because I wanted to try this branch), you are trying to merge from "master" of your fork into "master" of ggml-org/llama.cpp. Is this accepted practice or is creating a separate branch a requirement for merging into llama.cpp?

It's acceptable, but not recommended.

@pwilkin
Copy link
Collaborator Author

pwilkin commented Jul 15, 2025

Yeah, it's generally not a great idea because if there are conflicts and you have to merge upstream changes then you have no master branch locally to easily pull them to, I just realized too late that I forgot to make a branch :>

@ThiloteE
Copy link
Contributor

ThiloteE commented Jul 15, 2025

I did try https://huggingface.co/ilintar/ERNIE-4.5-21B-A3B-PT-gguf/blob/main/baidu-ERNIE-4.5-21B-A3B-PT-iq3_M.gguf on my rtx 3060 12GB with cuda 12.9.

Command: llama-server --port 8080 --jinja -fa -c 8192 -ngl 29 -t 6

something is wrong with beginning of sentence (bos) or end of sentence (eos).

image

but not always

image

For some reason the default tokenizer-config.json holds a jinja template that sets the csl token. I suppose at Baidu, they are having an app or downstream application that can make use of that somehow, but maybe for llama.cpp we can add a default template that works out of the box and filters those out. I would need to do some experiments to fix this and I don't have time for that in the coming days, unfortunately. Over and out for now.

image

@CISC
Copy link
Collaborator

CISC commented Jul 16, 2025

@ThiloteE It looks like this is an error in the model config, they have not put the cls/sep tokens in the added_tokens mapping, that's probably why they are hardcoded in the chat template, however since they then are not marked as special tokens they are printed as-is and probably tokenized incorrectly.

There's not much we can do here though, this needs to be fixed by Baidu.

The cls/sep tokens will also normally only be used for WPM tokenization, so it's quite possible that Ernie is broken in general.

@a31413510
Copy link

@ThiloteE It looks like this is an error in the model config, they have not put the cls/sep tokens in the added_tokens mapping, that's probably why they are hardcoded in the chat template, however since they then are not marked as special tokens they are printed as-is and probably tokenized incorrectly.

There's not much we can do here though, this needs to be fixed by Baidu.

The cls/sep tokens will also normally only be used for WPM tokenization, so it's quite possible that Ernie is broken in general.

What problems might this cause?

@CISC
Copy link
Collaborator

CISC commented Jul 17, 2025

What problems might this cause?

Incorrect tokenization and incorrect BOS/CLS and/or EOS/SEP will cause the model to respond differently, quite often badly, to prompts.

Ernie uses SPM tokenization, which means it will add a BOS (<s>) token by default but no EOS (</s>), if the chat template is to be believed this is incorrect and it should instead add a CLS (<|begin_of_sentence|>) token and separate conversation fragments with SEP (<|end_of_sentence|>).

In effect this means that you have to use --jinja and chat completion (that way no BOS is added) with Ernie models (this was really the case already as the model was added before launch without built-in chat template support), but depending on how CLS/SEP is tokenized (I haven't checked) and whether the model was trained on that same tokenization or not, its responses might not be very good.

@CISC CISC merged commit cb887f1 into ggml-org:master Jul 17, 2025
50 checks passed
@nicoboss
Copy link
Contributor

Do you need help testing the 300B model?

@pwilkin I just tested the 300B model on latest commit. It unfortunately fails the load due to missing tensor 'blk.3.ffn_gate_shexp.weight'. Do you have any idea how to fix this? This error does occur before llama.cpp even loads the model into memory so it should be no problem to reproduce it on your side even if you don't have the resources necessary to actually load it. If there is anything I can do to help you debug this please let me know. Here the full log:

root@AI:/apool/llama.cpp/build/bin# ./llama-cli -m /bpool/ERNIE-4.5-300B-A47B-PT.gguf -ngl 0 -c 7000                                                                                 ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
  Device 1: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
build: 5937 (075ffdcd) with cc (Debian 12.2.0-14+deb12u1) 12.2.0 for x86_64-linux-gnu
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 4090) - 23686 MiB free
llama_model_load_from_file_impl: using device CUDA1 (NVIDIA GeForce RTX 4090) - 23689 MiB free
llama_model_loader: loaded meta data with 34 key-value pairs and 591 tensors from /bpool/ERNIE-4.5-300B-A47B-PT.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = ernie4_5-moe
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = ERNIE 4.5 300B A47B PT
llama_model_loader: - kv   3:                           general.finetune str              = PT
llama_model_loader: - kv   4:                           general.basename str              = ERNIE-4.5
llama_model_loader: - kv   5:                         general.size_label str              = 300B-A47B
llama_model_loader: - kv   6:                            general.license str              = apache-2.0
llama_model_loader: - kv   7:                               general.tags arr[str,2]       = ["ERNIE4.5", "text-generation"]
llama_model_loader: - kv   8:                          general.languages arr[str,2]       = ["en", "zh"]
llama_model_loader: - kv   9:                   ernie4_5-moe.block_count u32              = 54
llama_model_loader: - kv  10:                ernie4_5-moe.context_length u32              = 131072
llama_model_loader: - kv  11:              ernie4_5-moe.embedding_length u32              = 8192
llama_model_loader: - kv  12:           ernie4_5-moe.feed_forward_length u32              = 28672
llama_model_loader: - kv  13:          ernie4_5-moe.attention.head_count u32              = 64
llama_model_loader: - kv  14:       ernie4_5-moe.attention.head_count_kv u32              = 8
llama_model_loader: - kv  15:                ernie4_5-moe.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv  16: ernie4_5-moe.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  17:                          general.file_type u32              = 1
llama_model_loader: - kv  18:                  ernie4_5-moe.expert_count u32              = 64
llama_model_loader: - kv  19:             ernie4_5-moe.expert_used_count u32              = 8
llama_model_loader: - kv  20:     ernie4_5-moe.interleave_moe_layer_step u32              = 1
llama_model_loader: - kv  21:     ernie4_5-moe.leading_dense_block_count u32              = 3
llama_model_loader: - kv  22:    ernie4_5-moe.expert_feed_forward_length u32              = 3584
llama_model_loader: - kv  23: ernie4_5-moe.expert_shared_feed_forward_length u32              = 3584
llama_model_loader: - kv  24:               general.quantization_version u32              = 2
llama_model_loader: - kv  25:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  26:                         tokenizer.ggml.pre str              = default
llama_model_loader: - kv  27:                      tokenizer.ggml.tokens arr[str,103424]  = ["<unk>", "<s>", "</s>", "0", "1", "2...
llama_model_loader: - kv  28:                      tokenizer.ggml.scores arr[f32,103424]  = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  29:                  tokenizer.ggml.token_type arr[i32,103424]  = [2, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  30:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  31:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  32:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  33:                    tokenizer.chat_template str              = {%- if not add_generation_prompt is d...
llama_model_loader: - type  f32:  211 tensors
llama_model_loader: - type  f16:  380 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = F16
print_info: file size   = 557.88 GiB (16.00 BPW)
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special tokens cache size = 1012
load: token to piece cache size = 0.5907 MB
print_info: arch             = ernie4_5-moe
print_info: vocab_only       = 0
print_info: n_ctx_train      = 131072
print_info: n_embd           = 8192
print_info: n_layer          = 54
print_info: n_head           = 64
print_info: n_head_kv        = 8
print_info: n_rot            = 128
print_info: n_swa            = 0
print_info: is_swa_any       = 0
print_info: n_embd_head_k    = 128
print_info: n_embd_head_v    = 128
print_info: n_gqa            = 8
print_info: n_embd_k_gqa     = 1024
print_info: n_embd_v_gqa     = 1024
print_info: f_norm_eps       = 0.0e+00
print_info: f_norm_rms_eps   = 1.0e-05
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: f_attn_scale     = 0.0e+00
print_info: n_ff             = 28672
print_info: n_expert         = 64
print_info: n_expert_used    = 8
print_info: causal attn      = 1
print_info: pooling type     = 0
print_info: rope type        = 0
print_info: rope scaling     = linear
print_info: freq_base_train  = 500000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 131072
print_info: rope_finetuned   = unknown
print_info: model type       = 300B.A47B
print_info: model params     = 299.48 B
print_info: general.name     = ERNIE 4.5 300B A47B PT
print_info: vocab type       = SPM
print_info: n_vocab          = 103424
print_info: n_merges         = 0
print_info: BOS token        = 1 '<s>'
print_info: EOS token        = 2 '</s>'
print_info: UNK token        = 0 '<unk>'
print_info: PAD token        = 0 '<unk>'
print_info: LF token         = 23 '<0x0A>'
print_info: EOG token        = 2 '</s>'
print_info: max token length = 48
load_tensors: loading model tensors, this can take a while... (mmap = true)
llama_model_load: error loading model: missing tensor 'blk.3.ffn_gate_shexp.weight'
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '/bpool/ERNIE-4.5-300B-A47B-PT.gguf'
main: error: unable to load model

In case it helps here the convert_hf_to_gguf.py output:

root@AI:/apool/llama.cpp# venv/bin/python convert_hf_to_gguf.py --outfile /bpool/ERNIE-4.5-300B-A47B-PT.gguf /bpool/ERNIE-4.5-300B-A47B-PT
INFO:hf-to-gguf:Loading model: ERNIE-4.5-300B-A47B-PT
WARNING:hf-to-gguf:Failed to load model config from /bpool/ERNIE-4.5-300B-A47B-PT: Loading /bpool/ERNIE-4.5-300B-A47B-PT requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error.
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:hf-to-gguf:Model architecture: Ernie4_5_MoeForCausalLM
WARNING:hf-to-gguf:Failed to load model config from /bpool/ERNIE-4.5-300B-A47B-PT: Loading /bpool/ERNIE-4.5-300B-A47B-PT requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error.
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: loading model part 'model-00001-of-00123.safetensors'
INFO:hf-to-gguf:token_embd.weight,            torch.bfloat16 --> F16, shape = {8192, 103424}
INFO:hf-to-gguf:blk.0.attn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.0.ffn_down.weight,        torch.bfloat16 --> F16, shape = {28672, 8192}
INFO:hf-to-gguf:blk.0.ffn_gate.weight,        torch.bfloat16 --> F16, shape = {8192, 28672}
INFO:hf-to-gguf:blk.0.ffn_up.weight,          torch.bfloat16 --> F16, shape = {8192, 28672}
INFO:hf-to-gguf:blk.0.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.0.attn_k.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.0.attn_output.weight,     torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.0.attn_q.weight,          torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.0.attn_v.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.1.attn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.1.ffn_down.weight,        torch.bfloat16 --> F16, shape = {28672, 8192}
INFO:hf-to-gguf:blk.1.ffn_gate.weight,        torch.bfloat16 --> F16, shape = {8192, 28672}
INFO:hf-to-gguf:blk.1.ffn_up.weight,          torch.bfloat16 --> F16, shape = {8192, 28672}
INFO:hf-to-gguf:blk.1.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.1.attn_output.weight,     torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00002-of-00123.safetensors'
INFO:hf-to-gguf:blk.1.attn_k.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.1.attn_q.weight,          torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.1.attn_v.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.2.attn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.2.ffn_down.weight,        torch.bfloat16 --> F16, shape = {28672, 8192}
INFO:hf-to-gguf:blk.2.ffn_gate.weight,        torch.bfloat16 --> F16, shape = {8192, 28672}
INFO:hf-to-gguf:blk.2.ffn_up.weight,          torch.bfloat16 --> F16, shape = {8192, 28672}
INFO:hf-to-gguf:blk.2.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.2.attn_k.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.2.attn_output.weight,     torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.2.attn_q.weight,          torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.2.attn_v.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.3.attn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00003-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00004-of-00123.safetensors'
INFO:hf-to-gguf:blk.3.ffn_gate_exps.weight,   torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.3.ffn_up_exps.weight,     torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.3.ffn_down_exps.weight,   torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.3.ffn_gate_inp.weight,    torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.3.exp_probs_b.bias,       torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.3.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.3.attn_k.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.3.attn_output.weight,     torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.3.attn_q.weight,          torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.3.attn_v.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.4.attn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00005-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00006-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00007-of-00123.safetensors'
INFO:hf-to-gguf:blk.4.ffn_gate_exps.weight,   torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.4.ffn_up_exps.weight,     torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.4.ffn_down_exps.weight,   torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.4.ffn_gate_inp.weight,    torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.4.exp_probs_b.bias,       torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.4.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.4.attn_k.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.4.attn_output.weight,     torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.4.attn_q.weight,          torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.4.attn_v.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.5.attn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00008-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00009-of-00123.safetensors'
INFO:hf-to-gguf:blk.5.ffn_gate_exps.weight,   torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.5.ffn_up_exps.weight,     torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.5.ffn_down_exps.weight,   torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.5.ffn_gate_inp.weight,    torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.5.exp_probs_b.bias,       torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.5.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.5.attn_k.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.5.attn_output.weight,     torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.5.attn_q.weight,          torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.5.attn_v.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.6.attn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00010-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00011-of-00123.safetensors'
INFO:hf-to-gguf:blk.6.ffn_gate_exps.weight,   torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.6.ffn_up_exps.weight,     torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.6.ffn_down_exps.weight,   torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.6.ffn_gate_inp.weight,    torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.6.exp_probs_b.bias,       torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.6.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.6.attn_k.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.6.attn_output.weight,     torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.6.attn_q.weight,          torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.6.attn_v.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.7.attn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00012-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00013-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00014-of-00123.safetensors'
INFO:hf-to-gguf:blk.7.ffn_gate_exps.weight,   torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.7.ffn_up_exps.weight,     torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.7.ffn_down_exps.weight,   torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.7.ffn_gate_inp.weight,    torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.7.exp_probs_b.bias,       torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.7.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.7.attn_k.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.7.attn_output.weight,     torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.7.attn_q.weight,          torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.7.attn_v.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.8.attn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00015-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00016-of-00123.safetensors'
INFO:hf-to-gguf:blk.8.ffn_gate_exps.weight,   torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.8.ffn_up_exps.weight,     torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.8.ffn_down_exps.weight,   torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.8.ffn_gate_inp.weight,    torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.8.exp_probs_b.bias,       torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.8.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.8.attn_k.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.8.attn_output.weight,     torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.8.attn_q.weight,          torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.8.attn_v.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.9.attn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00017-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00018-of-00123.safetensors'
INFO:hf-to-gguf:blk.10.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.9.ffn_gate_exps.weight,   torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.9.ffn_up_exps.weight,     torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.9.ffn_down_exps.weight,   torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.9.ffn_gate_inp.weight,    torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.9.exp_probs_b.bias,       torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.9.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.9.attn_k.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.9.attn_output.weight,     torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.9.attn_q.weight,          torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.9.attn_v.weight,          torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:gguf: loading model part 'model-00019-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00020-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00021-of-00123.safetensors'
INFO:hf-to-gguf:blk.10.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.10.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.10.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.10.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.10.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.10.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.10.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.10.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.10.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.10.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.11.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00022-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00023-of-00123.safetensors'
INFO:hf-to-gguf:blk.11.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.11.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.11.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.11.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.11.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.11.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.11.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.11.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.11.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.11.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.12.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00024-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00025-of-00123.safetensors'
INFO:hf-to-gguf:blk.12.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.12.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.12.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.12.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.12.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.12.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.12.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.12.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.12.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.12.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.13.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00026-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00027-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00028-of-00123.safetensors'
INFO:hf-to-gguf:blk.13.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.13.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.13.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.13.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.13.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.13.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.13.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.13.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.13.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.13.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.14.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00029-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00030-of-00123.safetensors'
INFO:hf-to-gguf:blk.14.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.14.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.14.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.14.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.14.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.14.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.14.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.14.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.14.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.14.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.15.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00031-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00032-of-00123.safetensors'
INFO:hf-to-gguf:blk.15.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.15.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.15.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.15.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.15.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.15.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.15.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00033-of-00123.safetensors'
INFO:hf-to-gguf:blk.15.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.15.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.15.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.16.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00034-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00035-of-00123.safetensors'
INFO:hf-to-gguf:blk.16.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.16.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.16.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.16.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.16.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.16.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.16.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.16.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.16.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.16.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.17.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00036-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00037-of-00123.safetensors'
INFO:hf-to-gguf:blk.17.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.17.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.17.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.17.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.17.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.17.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.17.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.17.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.17.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.17.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.18.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00038-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00039-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00040-of-00123.safetensors'
INFO:hf-to-gguf:blk.18.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.18.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.18.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.18.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.18.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.18.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.18.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.18.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.18.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.18.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.19.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00041-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00042-of-00123.safetensors'
INFO:hf-to-gguf:blk.19.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.19.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.19.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.19.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.19.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.19.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.19.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.19.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.19.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.19.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.20.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00043-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00044-of-00123.safetensors'
INFO:hf-to-gguf:blk.20.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.20.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.20.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.20.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.20.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.20.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.20.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.20.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.20.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.20.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.21.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00045-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00046-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00047-of-00123.safetensors'
INFO:hf-to-gguf:blk.21.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.21.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.21.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.21.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.21.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.21.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.21.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.21.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.21.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.21.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.22.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00048-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00049-of-00123.safetensors'
INFO:hf-to-gguf:blk.22.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.22.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.22.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.22.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.22.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.22.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.22.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.22.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.22.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.22.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.23.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00050-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00051-of-00123.safetensors'
INFO:hf-to-gguf:blk.23.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.23.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.23.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.23.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.23.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.23.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.23.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.23.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.23.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.23.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.24.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00052-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00053-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00054-of-00123.safetensors'
INFO:hf-to-gguf:blk.24.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.24.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.24.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.24.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.24.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.24.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.24.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.24.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.24.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.24.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.25.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00055-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00056-of-00123.safetensors'
INFO:hf-to-gguf:blk.25.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.25.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.25.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.25.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.25.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.25.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.25.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.25.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.25.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.25.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.26.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00057-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00058-of-00123.safetensors'
INFO:hf-to-gguf:blk.26.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.26.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.26.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.26.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.26.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.26.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.26.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.26.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.26.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.26.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.27.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00059-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00060-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00061-of-00123.safetensors'
INFO:hf-to-gguf:blk.27.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.27.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.27.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.27.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.27.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.27.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.27.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.27.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.27.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.27.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.28.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00062-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00063-of-00123.safetensors'
INFO:hf-to-gguf:blk.28.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.28.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.28.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.28.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.28.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.28.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.28.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.28.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.28.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.28.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.29.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00064-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00065-of-00123.safetensors'
INFO:hf-to-gguf:blk.29.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.29.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.29.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.29.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.29.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.29.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.29.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00066-of-00123.safetensors'
INFO:hf-to-gguf:blk.29.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.29.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.29.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.30.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00067-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00068-of-00123.safetensors'
INFO:hf-to-gguf:blk.30.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.30.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.30.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.30.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.30.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.30.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.30.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.30.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.30.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.30.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.31.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00069-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00070-of-00123.safetensors'
INFO:hf-to-gguf:blk.31.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.31.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.31.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.31.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.31.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.31.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.31.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.31.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.31.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.31.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.32.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00071-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00072-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00073-of-00123.safetensors'
INFO:hf-to-gguf:blk.32.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.32.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.32.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.32.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.32.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.32.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.32.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.32.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.32.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.32.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.33.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00074-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00075-of-00123.safetensors'
INFO:hf-to-gguf:blk.33.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.33.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.33.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.33.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.33.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.33.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.33.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.33.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.33.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.33.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.34.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00076-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00077-of-00123.safetensors'
INFO:hf-to-gguf:blk.34.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.34.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.34.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.34.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.34.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.34.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.34.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.34.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.34.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.34.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.35.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00078-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00079-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00080-of-00123.safetensors'
INFO:hf-to-gguf:blk.35.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.35.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.35.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.35.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.35.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.35.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.35.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.35.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.35.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.35.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.36.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00081-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00082-of-00123.safetensors'
INFO:hf-to-gguf:blk.36.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.36.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.36.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.36.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.36.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.36.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.36.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.36.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.36.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.36.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.37.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00083-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00084-of-00123.safetensors'
INFO:hf-to-gguf:blk.37.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.37.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.37.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.37.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.37.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.37.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.37.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.37.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.37.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.37.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.38.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00085-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00086-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00087-of-00123.safetensors'
INFO:hf-to-gguf:blk.38.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.38.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.38.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.38.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.38.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.38.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.38.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.38.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.38.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.38.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.39.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00088-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00089-of-00123.safetensors'
INFO:hf-to-gguf:blk.39.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.39.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.39.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.39.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.39.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.39.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.39.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.39.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.39.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.39.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.40.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00090-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00091-of-00123.safetensors'
INFO:hf-to-gguf:blk.40.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.40.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.40.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.40.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.40.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.40.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.40.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.40.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.40.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.40.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.41.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00092-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00093-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00094-of-00123.safetensors'
INFO:hf-to-gguf:blk.41.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.41.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.41.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.41.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.41.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.41.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.41.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.41.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.41.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.41.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.42.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00095-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00096-of-00123.safetensors'
INFO:hf-to-gguf:blk.42.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.42.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.42.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.42.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.42.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.42.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.42.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.42.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.42.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.42.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.43.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00097-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00098-of-00123.safetensors'
INFO:hf-to-gguf:blk.43.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.43.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.43.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.43.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.43.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.43.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.43.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00099-of-00123.safetensors'
INFO:hf-to-gguf:blk.43.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.43.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.43.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.44.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00100-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00101-of-00123.safetensors'
INFO:hf-to-gguf:blk.44.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.44.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.44.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.44.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.44.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.44.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.44.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.44.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.44.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.44.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.45.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00102-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00103-of-00123.safetensors'
INFO:hf-to-gguf:blk.45.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.45.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.45.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.45.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.45.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.45.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.45.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.45.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.45.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.45.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.46.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00104-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00105-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00106-of-00123.safetensors'
INFO:hf-to-gguf:blk.46.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.46.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.46.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.46.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.46.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.46.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.46.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.46.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.46.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.46.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.47.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00107-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00108-of-00123.safetensors'
INFO:hf-to-gguf:blk.47.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.47.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.47.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.47.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.47.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.47.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.47.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.47.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.47.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.47.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.48.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00109-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00110-of-00123.safetensors'
INFO:hf-to-gguf:blk.48.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.48.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.48.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.48.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.48.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.48.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.48.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.48.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.48.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.48.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.49.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00111-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00112-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00113-of-00123.safetensors'
INFO:hf-to-gguf:blk.49.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.49.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.49.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.49.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.49.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.49.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.49.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.49.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.49.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.49.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.50.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00114-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00115-of-00123.safetensors'
INFO:hf-to-gguf:blk.50.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.50.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.50.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.50.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.50.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.50.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.50.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.50.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.50.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.50.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.51.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00116-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00117-of-00123.safetensors'
INFO:hf-to-gguf:blk.51.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.51.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.51.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.51.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.51.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.51.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.51.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.51.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.51.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.51.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.52.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00118-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00119-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00120-of-00123.safetensors'
INFO:hf-to-gguf:blk.52.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.52.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.52.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.52.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.52.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.52.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.52.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.52.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.52.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.52.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.53.attn_norm.weight,      torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00121-of-00123.safetensors'
INFO:hf-to-gguf:gguf: loading model part 'model-00122-of-00123.safetensors'
INFO:hf-to-gguf:blk.53.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.53.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {8192, 3584, 64}
INFO:hf-to-gguf:blk.53.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {3584, 8192, 64}
INFO:hf-to-gguf:blk.53.ffn_gate_inp.weight,   torch.float32 --> F32, shape = {8192, 64}
INFO:hf-to-gguf:blk.53.exp_probs_b.bias,      torch.float32 --> F32, shape = {64, 1}
INFO:hf-to-gguf:blk.53.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:blk.53.attn_k.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:blk.53.attn_output.weight,    torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.53.attn_q.weight,         torch.bfloat16 --> F16, shape = {8192, 8192}
INFO:hf-to-gguf:blk.53.attn_v.weight,         torch.bfloat16 --> F16, shape = {8192, 1024}
INFO:hf-to-gguf:output_norm.weight,           torch.bfloat16 --> F32, shape = {8192}
INFO:hf-to-gguf:gguf: loading model part 'model-00123-of-00123.safetensors'
INFO:hf-to-gguf:output.weight,                torch.bfloat16 --> F16, shape = {8192, 103424}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 131072
INFO:hf-to-gguf:gguf: embedding length = 8192
INFO:hf-to-gguf:gguf: feed forward length = 28672
INFO:hf-to-gguf:gguf: head count = 64
INFO:hf-to-gguf:gguf: key-value head count = 8
INFO:hf-to-gguf:gguf: rope theta = 500000
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: file type = 1
WARNING:gguf.gguf_writer:Duplicated key name 'ernie4_5-moe.rope.freq_base', overwriting it with new value 500000 of type FLOAT32
INFO:hf-to-gguf:Set model quantization version
INFO:hf-to-gguf:Set model tokenizer
INFO:gguf.vocab:Setting special token type bos to 1
INFO:gguf.vocab:Setting special token type eos to 2
INFO:gguf.vocab:Setting special token type pad to 0
INFO:gguf.vocab:Setting chat_template to {%- if not add_generation_prompt is defined -%}
    {%- set add_generation_prompt = true -%}
{%- endif -%}
{%- if not cls_token is defined -%}
    {%- set cls_token = "<|begin_of_sentence|>" -%}
{%- endif -%}
{%- if not sep_token is defined -%}
    {%- set sep_token = "<|end_of_sentence|>" -%}
{%- endif -%}
{{- cls_token -}}
{%- for message in messages -%}
    {%- if message["role"] == "user" -%}
        {{- "User: " + message["content"] + "
" -}}
    {%- elif message["role"] == "assistant" -%}
        {{- "Assistant: " + message["content"] + sep_token -}}
    {%- elif message["role"] == "system" -%}
        {{- message["content"] + "
" -}}
    {%- endif -%}
{%- endfor -%}
{%- if add_generation_prompt -%}
    {{- "Assistant: " -}}
{%- endif -%}
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:/bpool/ERNIE-4.5-300B-A47B-PT.gguf: n_tensors = 591, total_size = 599.0G

@CISC
Copy link
Collaborator

CISC commented Jul 17, 2025

@pwilkin I just tested the 300B model on latest commit. It unfortunately fails the load due to missing tensor 'blk.3.ffn_gate_shexp.weight'.

LOL, the timing! :D

@CISC
Copy link
Collaborator

CISC commented Jul 17, 2025

@pwilkin The fix seems simple, just check moe_num_shared_experts before you add_expert_shared_feed_forward_length, make another PR. :)

@CISC
Copy link
Collaborator

CISC commented Jul 17, 2025

Oh, and add_rope_freq_base can be removed.

@CISC
Copy link
Collaborator

CISC commented Jul 17, 2025

@pwilkin Looking closer at it I think things are a little more broken, but we can address that when you make the follow up PR.

@pwilkin
Copy link
Collaborator Author

pwilkin commented Jul 17, 2025

There's one more difference in the "big" MoE:

"moe_gate": "topk",

I guess this refers to:

uint32_t expert_gating_func = LLAMA_EXPERT_GATING_FUNC_TYPE_NONE?

@CISC
Copy link
Collaborator

CISC commented Jul 17, 2025

There's one more difference in the "big" MoE:

"moe_gate": "topk",

I guess this refers to:

uint32_t expert_gating_func = LLAMA_EXPERT_GATING_FUNC_TYPE_NONE?

No, that would be the moe_gate_act (which defaults to softmax).

I think moe_gate is topk only in llama.cpp.
Edit: Yep:

// select experts
ggml_tensor * selected_experts = ggml_top_k(ctx0, selection_probs, n_expert_used); // [n_expert_used, n_tokens]

@fernandaspets
Copy link

Hi thanks for the amazing work! Q: Are there any 300B ggufs up on HF? :P

@pwilkin
Copy link
Collaborator Author

pwilkin commented Jul 18, 2025

@CISC

I think moe_gate is topk only in llama.cpp.

Well, this got relevant pretty quickly.

I tried to get to work on the VL model. Actually, getting the projector converted wasn't that hard. But the normal MoE...

It turns out the VL model (the 28B one) uses something they call a "top2 gate":

https://huggingface.co/baidu/ERNIE-4.5-VL-28B-A3B-PT/blob/main/modeling_ernie_45t_vl.py

In the config, the moe_num_experts is a two-element array, not a list. And it actually means the experts for each layer are a two-element array as well, only flattened for convenience: after each 64 experts of [1 536, 2 560] dimensions come 64 experts of [512, 2 560] dimensions. Apparently, these are supposed to be multimodal (they spell it "multimodel", but I guess they actually meant "multimodal") experts.

But here's where my competence ends - I could of course create a new tensor type to store the "other" experts and even write some logic for storing the weight and weight_1 tensors together and then decoupling them on execution, but I have no clue how to implement this whole "top2 gate" algorithm.

Would love some help with this or some pointers at least (I don't even understand why there are two different feed forward lengths for the two different tensor types).

@CISC
Copy link
Collaborator

CISC commented Jul 18, 2025

I think they actually mean multimodel, as in that's why you have 2 values, one for each model baked into the same tensor.

You can just ignore top2 gating for now, topk probably works fine, anyway, for future reference it is described in the GShard paper.

@pwilkin
Copy link
Collaborator Author

pwilkin commented Jul 18, 2025

You mean just ignore the second layer of tensors? I guess that would just be the 21B-A3B model with a projector then 😄

@CISC
Copy link
Collaborator

CISC commented Jul 18, 2025

No, I just meant top2 vs topk should not cause much issue.

I suspect you will have to read that GShard paper for more info on what's going on with the layers, but it looks like they are combining results from both somehow. Oh, and we have trailing dense layers! :)

@pwilkin
Copy link
Collaborator Author

pwilkin commented Aug 27, 2025

I think they actually mean multimodel, as in that's why you have 2 values, one for each model baked into the same tensor.

You can just ignore top2 gating for now, topk probably works fine, anyway, for future reference it is described in the GShard paper.

It seems they actually meant "multimodal", judging from the vLLM implementation that just landed: https://github.com/vllm-project/vllm/pull/22514/files

The first layer of experts is text, the second layer of experts is video, apparently.

@Downtown-Case
Copy link

Downtown-Case commented Aug 29, 2025

Is there any way the 'architecture' of the 2bit version could be added as well? Looks like its weights are stored differently:

https://huggingface.co/baidu/ERNIE-4.5-300B-A47B-2Bits-Paddle/blob/main/model.safetensors.index.json

https://huggingface.co/baidu/ERNIE-4.5-300B-A47B-Paddle/blob/main/model.safetensors.index.json

I'm not talking about implementing their quantization kernel, of course, but simply unpacking the QAT weights instead of the bf16 ones should result in much better GGUF quantizations, right?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

python python script changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature Request: Add Ernie4.5MoE support

9 participants